Add QNN EP HTP shared memory allocator#23136
Merged
… declarations and definitions for IAllocator::TensorAlloc().
…ion clean up callback
edgchen1
commented
Jan 11, 2025
yuslepukhin
reviewed
Jan 13, 2025
Member
Can this be used for LoRA support when the model is modified to have optional inputs, and the data can be fed to override default initializers?
yuslepukhin
reviewed
Jan 13, 2025
yuslepukhin
reviewed
Jan 13, 2025
Contributor
Author
I'm not too familiar with the scenario. If that can be done using OrtValues, an input OrtValue can use this new allocator.
baijumeswani
approved these changes
Jan 13, 2025
skottmckay
approved these changes
Jan 13, 2025
adrianlizarraga
approved these changes
Jan 14, 2025
Contributor
Author
/azp run Windows GPU WebGPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
guschmue
pushed a commit
that referenced
this pull request
Mar 6, 2025
Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (`HtpSharedMemoryAllocator`) calls the rpcmem shared library (libcdsprpc.so/dll) to allocate and free memory that can be shared between HTP and CPU.

The allocator can be enabled by setting QNN EP option `enable_htp_shared_memory_allocator` to `1`. `QNNExecutionProvider::CreatePreferredAllocators()` will then return an instance of `HtpSharedMemoryAllocator`.

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to `QnnBackendManager`, which also manages the QNN context handles.

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:
- HTP shared memory usage is only supported for graph inputs and outputs. Intermediate values are not supported.
- An allocation is assigned to a single shared memory buffer. The allocator is not smart enough to have multiple allocations share a single shared memory buffer.

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
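The per-context handle bookkeeping described in this commit message can be pictured with a small toy model. This is only a sketch under stated assumptions, not the `QnnBackendManager` code: the class `ToyMemHandleRegistry`, its method names, and the integer context ids, buffer addresses, and handles are all hypothetical. It shows one handle per (context, buffer) pair, reuse of an existing handle on repeated registration, and removal of every handle for a buffer once that buffer is freed.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Tuple


@dataclass
class ToyMemHandleRegistry:
    """Hypothetical stand-in for per-context memory handle bookkeeping."""

    # Maps (context id, buffer address) to a fake handle value.
    _handles: Dict[Tuple[int, int], int] = field(default_factory=dict)
    # Source of fresh fake handle values: 1, 2, 3, ...
    _next_handle: count = field(default_factory=lambda: count(1))

    def get_or_register(self, context_id: int, buffer_addr: int) -> int:
        """Return the handle for this buffer in this context, registering it if needed."""
        key = (context_id, buffer_addr)
        if key not in self._handles:
            self._handles[key] = next(self._next_handle)
        return self._handles[key]

    def unregister_buffer(self, buffer_addr: int) -> int:
        """Drop the handles of a freed buffer in every context; return how many were dropped."""
        stale = [key for key in self._handles if key[1] == buffer_addr]
        for key in stale:
            del self._handles[key]
        return len(stale)

    def unregister_context(self, context_id: int) -> int:
        """Drop every handle owned by a context that is being released."""
        stale = [key for key in self._handles if key[0] == context_id]
        for key in stale:
            del self._handles[key]
        return len(stale)
```

In this model the same buffer used from two contexts gets two distinct handles, which is why freeing the buffer has to clean up across all contexts rather than just one.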
ashrit-ms
pushed a commit
that referenced
this pull request
Mar 17, 2025
HectorSVC
pushed a commit
that referenced
this pull request
Apr 26, 2025
…24196)

### Description
During inference, setting the QNN EP option `enable_htp_shared_memory_allocator` gives a hint that we use RPC allocated buffers to avoid buffer copy between CPU and NPU. With the current PR, we add hints in the compilation phase so that, if RPC memory is going to be used, any additional allocations done on the CPU can be avoided.

### Motivation and Context
This should help reduce the peak CPU memory consumption while running AI workloads using shared memory.

Related PR: #23136

Co-authored-by: Ashish Garg (AISW) <ashigarg@qti.qualcomm.com>
ankitm3k
pushed a commit
to intel/onnxruntime
that referenced
this pull request
May 12, 2025
Description
Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (`HtpSharedMemoryAllocator`) calls the rpcmem shared library (libcdsprpc.so/dll) to allocate and free memory that can be shared between HTP and CPU.

The allocator can be enabled by setting QNN EP option `enable_htp_shared_memory_allocator` to `1`. `QNNExecutionProvider::CreatePreferredAllocators()` will then return an instance of `HtpSharedMemoryAllocator`.

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to `QnnBackendManager`, which also manages the QNN context handles.

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:
- HTP shared memory usage is only supported for graph inputs and outputs. Intermediate values are not supported.
- An allocation is assigned to a single shared memory buffer. The allocator is not smart enough to have multiple allocations share a single shared memory buffer.
Motivation and Context
Improve performance by using HTP shared memory to avoid overhead from copying data between CPU and NPU.
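As a rough illustration of turning the option on from the Python API, the sketch below builds the provider options and passes them to a session. The option name `enable_htp_shared_memory_allocator` and the value `1` come from the description above; the helper names `qnn_htp_provider_options` and `create_session`, the `backend_path` value, and the model path are assumptions for illustration, and actually creating the session requires an onnxruntime build with the QNN EP on a device with HTP support.

```python
from typing import Dict


def qnn_htp_provider_options(backend_path: str, use_shared_memory: bool = True) -> Dict[str, str]:
    """Build QNN EP provider options as string key/value pairs (hypothetical helper)."""
    options = {"backend_path": backend_path}
    if use_shared_memory:
        # Option name and value as given in the PR description.
        options["enable_htp_shared_memory_allocator"] = "1"
    return options


def create_session(model_path: str, options: Dict[str, str]):
    """Create a session on the QNN EP; needs a QNN-enabled onnxruntime build."""
    # Imported lazily so the options helper can be used without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        providers=["QNNExecutionProvider"],
        provider_options=[options],
    )
```

A call such as `create_session("model.onnx", qnn_htp_provider_options("QnnHtp.dll"))` would then request the HTP backend with the shared memory allocator enabled; the file names here are placeholders. Per the limitations listed in the description, only graph inputs and outputs can be backed by HTP shared memory.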